Medical insurance fraud detection algorithm based on graph convolutional neural network

YI Dongyi, DENG Genqiang, DONG Chaoxiong, ZHU Miaomiao, LYU Zhouping, ZHU Suisong

Journal of Computer Applications 2020, 40 (5): 1272-1277. DOI: 10.11772/j.issn.1001-9081.2019101766

Aiming at the problems of insufficient fraud samples, expensive data labeling and the low accuracy of traditional Euclidean-space models, a new One-Class medical insurance fraud detection model based on Graph convolution and Variational Auto-Encoder (OCGVAE) was proposed. Firstly, a social network was constructed from patient visit records, the weights of the relationships between patients and doctors were calculated, and a 2-layer Graph Convolutional neural Network (GCN) was designed to take the social network data as input and reduce its dimensionality. Secondly, a Variational Auto-Encoder (VAE) was designed so that the model could be trained with labels for only one class of samples. Finally, a Logistic Regression (LR) model was designed to discriminate the data category. The experimental results show that the detection accuracy of the OCGVAE model reaches 87.26%, which is 16.1%, 70.2%, 31.7%, 36.5% and 27.6% higher than that of the One-Class Adversarial Net (OCAN), One-Class Gaussian Process (OCGP), One-Class Nearest Neighbor (OCNN), One-Class Support Vector Machine (OCSVM) and Semi-supervised GCN (Semi-GCN) algorithms respectively, demonstrating that the proposed model effectively improves the accuracy of medical insurance fraud screening.
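The abstract does not spell out the graph construction or the GCN equations. The following is a minimal sketch under stated assumptions: the patient-doctor edge weight is taken to be the number of visits (a hypothetical weighting), and the two GCN layers use the standard propagation rule H = Â·ReLU(Â X W0)·W1 with Â = D^(-1/2)(A + I)D^(-1/2) from Kipf and Welling. Node features and weight matrices are random placeholders, and the VAE and LR stages are not shown.

```python
import math
import random

def build_adjacency(visits, patients, doctors):
    """Weighted patient-doctor graph; hypothetical weight = number of visits."""
    nodes = patients + doctors
    index = {name: i for i, name in enumerate(nodes)}
    adj = [[0.0] * len(nodes) for _ in nodes]
    for patient, doctor in visits:
        i, j = index[patient], index[doctor]
        adj[i][j] += 1.0  # undirected edge, so update both directions
        adj[j][i] += 1.0
    return adj

def normalize(adj):
    """Symmetric normalisation D^-1/2 (A + I) D^-1/2; self-loops keep every degree >= 1."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(a, b):
    """Plain-Python matrix product of lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def gcn_2layer(a_norm, x, w0, w1):
    """Two-layer GCN forward pass: A_norm * ReLU(A_norm * X * W0) * W1."""
    hidden = relu(matmul(matmul(a_norm, x), w0))
    return matmul(matmul(a_norm, hidden), w1)

def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
patients = ["p1", "p2", "p3"]
doctors = ["d1", "d2"]
visits = [("p1", "d1"), ("p1", "d1"), ("p2", "d1"), ("p3", "d2")]

adj = build_adjacency(visits, patients, doctors)
a_norm = normalize(adj)
x = random_matrix(5, 8, rng)    # 8 placeholder features per node
w0 = random_matrix(8, 4, rng)   # 8 -> 4 hidden units
w1 = random_matrix(4, 2, rng)   # 4 -> 2 output dimensions
embeddings = gcn_2layer(a_norm, x, w0, w1)
print(len(embeddings), len(embeddings[0]))
```

Each row of `embeddings` is a low-dimensional node representation; in a pipeline like the one described in the abstract, such vectors would be passed on to the VAE and then to the LR classifier.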

Random forest based on double features and relaxation boundary for anomaly detection

HU Miao, WANG Kaijun

Journal of Computer Applications 2019, 39 (4): 956-962. DOI: 10.11772/j.issn.1001-9081.2018091966

Aiming at the low performance of existing random-forest-based anomaly detection algorithms, a random forest algorithm combining double features and a relaxation boundary was proposed for anomaly detection. Firstly, when the binary decision trees of the random forest were constructed from normal-class data only, the value ranges of two features (each feature having its own value range) were recorded at each node, and these double-feature value ranges were used as the basis for judging abnormal points. Secondly, during anomaly detection, if a sample did not fall within the double-feature value ranges of a decision tree node, it was marked as a candidate anomaly; otherwise, it entered the lower node of the decision tree and continued to be compared with that node's double-feature value ranges. If there was no lower node, the sample was marked as a candidate normal sample. Finally, the discrimination mechanism of the random forest algorithm was used to determine the class of each sample. Experimental results on five UCI datasets show that the proposed method performs better than existing random-forest-based anomaly detection algorithms, that its overall performance is equivalent to or better than that of isolation Forest (iForest) and One-Class SVM (OCSVM), and that it remains stable at a high level.
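The detection rule described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: split features and thresholds are chosen at random, every tree is grown on the full normal-class data, the two range-checked features per node are sampled at random, and the forest combines trees by simple majority vote (an assumed stand-in for "the discrimination mechanism"). The relaxation boundary named in the title is not modelled, because the abstract does not define it.

```python
import random

class Node:
    """Tree node storing the value ranges of two randomly chosen features."""
    def __init__(self, features, ranges, split=None, threshold=None, left=None, right=None):
        self.features = features    # indices of the two range-checked features
        self.ranges = ranges        # [(min, max), (min, max)] seen in normal data at this node
        self.split = split          # feature used to route samples to a child
        self.threshold = threshold
        self.left = left
        self.right = right

def build_tree(data, rng, depth=0, max_depth=6):
    """Grow a random binary tree on normal-class rows only."""
    n_features = len(data[0])
    features = rng.sample(range(n_features), 2)
    ranges = [(min(r[f] for r in data), max(r[f] for r in data)) for f in features]
    if depth >= max_depth or len(data) < 2:
        return Node(features, ranges)
    split = rng.randrange(n_features)
    threshold = rng.uniform(min(r[split] for r in data), max(r[split] for r in data))
    left = [r for r in data if r[split] < threshold]
    right = [r for r in data if r[split] >= threshold]
    if not left or not right:
        return Node(features, ranges)  # degenerate split: stop here
    return Node(features, ranges, split, threshold,
                build_tree(left, rng, depth + 1, max_depth),
                build_tree(right, rng, depth + 1, max_depth))

def tree_label(node, x):
    """Candidate anomaly if x leaves any node's double-feature range; candidate normal at a leaf."""
    while True:
        for f, (lo, hi) in zip(node.features, node.ranges):
            if not lo <= x[f] <= hi:
                return "anomaly"
        if node.left is None:
            return "normal"
        node = node.left if x[node.split] < node.threshold else node.right

def forest_predict(forest, x):
    """Majority vote over trees (assumed combination rule)."""
    votes = sum(tree_label(tree, x) == "anomaly" for tree in forest)
    return "anomaly" if votes > len(forest) / 2 else "normal"

rng = random.Random(42)
normal = [[rng.random() for _ in range(3)] for _ in range(200)]  # toy normal data in [0, 1)^3
forest = [build_tree(normal, rng) for _ in range(25)]
print(forest_predict(forest, normal[0]))        # → normal
print(forest_predict(forest, [5.0, 5.0, 5.0]))  # → anomaly
```

A training row always lies inside the ranges recorded along its own path, so it reaches a leaf and is labelled normal, while a point far outside the training data already fails the range check at the root.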

Relative importance index of dummy variables in regression model

LI Haichao, WANG Kaijun, HU Miao, CHEN Lifei

Journal of Computer Applications 2017, 37 (11): 3048-3052. DOI: 10.11772/j.issn.1001-9081.2017.11.3048

To describe qualitative attributes in a regression model, it is usually necessary to introduce dummy variables. For regression equations with dummy variables, a method was proposed to describe the differing importance of different dummy variables in the regression equation. The regression sum of squares of a model with dummy variables was decomposed into a dummy-variable part and a non-dummy-variable part, the proportion of each part in the regression equation was calculated, and this proportion was taken as the relative importance index of each dummy variable in the regression equation. Experiments were conducted on nearly 100 000 lending records from the Lending Club and Prosper online lending platforms, analyzing the influence of loan purpose on the borrowing success rate and the influence of credit grade on the borrowing rate. The results show that, whereas a traditional regression equation only provides a dummy-variable coefficient and cannot show the variable's importance, the proposed method shows the importance of different dummy variables and provides an important means of quantitatively analyzing how strongly qualitative independent variables influence the dependent variable in a regression equation.
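The abstract does not give the exact decomposition formula. One standard way to split the regression sum of squares additively across regressors uses the fitted slopes b_j and the centred cross-products S_{x_j y}: SSR = Σ_j b_j S_{x_j y}, which holds because OLS residuals are orthogonal to every regressor. The sketch below applies that identity; it is an assumed, illustrative decomposition and may differ from the one used in the paper. The loan-purpose data in the example are synthetic and hypothetical.

```python
import random

def centre(v):
    mu = sum(v) / len(v)
    return [x - mu for x in v]

def solve(a, rhs):
    """Solve a square system a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [rhs[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_ols(columns, y):
    """OLS slopes from the centred normal equations; columns is a list of regressor vectors."""
    xc = [centre(col) for col in columns]
    yc = centre(y)
    k = len(columns)
    xtx = [[sum(p * q for p, q in zip(xc[i], xc[j])) for j in range(k)] for i in range(k)]
    xty = [sum(p * q for p, q in zip(xc[i], yc)) for i in range(k)]
    return solve(xtx, xty), xc, yc

def ssr_shares(columns, y):
    """Split SSR = sum_j b_j * S_{x_j y} and return each regressor's share (parts may be negative)."""
    b, xc, yc = fit_ols(columns, y)
    parts = [b[j] * sum(p * q for p, q in zip(xc[j], yc)) for j in range(len(columns))]
    ssr = sum(parts)
    return b, [p / ssr for p in parts]

# Hypothetical data: one continuous regressor and a 3-level loan purpose coded as two dummies.
rng = random.Random(1)
n = 300
income = [rng.gauss(50, 10) for _ in range(n)]
purpose = [rng.choice("ABC") for _ in range(n)]   # "C" is the baseline category
d_a = [1.0 if p == "A" else 0.0 for p in purpose]
d_b = [1.0 if p == "B" else 0.0 for p in purpose]
y = [0.2 * x + 4.0 * a + 1.0 * b + rng.gauss(0, 1) for x, a, b in zip(income, d_a, d_b)]

slopes, shares = ssr_shares([income, d_a, d_b], y)
print("slopes:", slopes)
print("shares (income, A, B):", shares)
print("dummy-variable part:", shares[1] + shares[2])
```

Summing the shares of the dummy columns gives the dummy-variable part of SSR, and the remaining share is the non-dummy part, mirroring the two-part split the abstract describes.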